ABSTRACT


This paper develops an explicit expression for a compression
matrix T of smallest possible left dimension k consistent with
preserving the Bayes assignment, in the sense that the n-variate normal
Bayes procedure assigns X to a given one of a finite number of
populations if and only if the k-variate Bayes procedure assigns TX to
that population. The Bayes population assignments of X and TX are shown
to be equivalent for a compression matrix T explicitly calculated as a
function of the known means and covariances of the given populations.
INTRODUCTION


In this paper $\Pi_i$ will denote an n-variate normal population
having a priori probability $\pi_i > 0$ and density $p_i(x)$; $i = 0, 1, \ldots, m$.

Using recent results [1] that characterize linear sufficient statistics,
we will develop an explicit expression for a $k \times n$ compression ($k < n$)
matrix $T$ for which, using the Bayes classification procedure [2],
in which costs of misclassification are tacitly assumed equal on all
classes, $X$ is assigned to $\Pi_i$ if and only if $TX$ is assigned to $\Pi_i$. We
will further demonstrate that $k$ is the smallest integer ($< n$) for
which the latter equivalence is valid and that $T$ can be directly
calculated in terms of the known population means and covariance matrices.

The applications which motivate the necessity for compressing or
reducing the size of a data vector are summarized very well in a review
paper by Laveen Kanal [3]. Our own interest was motivated by a
need to reduce computational requirements in a large area crop inventory
project using multidimensional data taken remotely by near-earth
satellites [4].

In all that follows $\eta_i$ and $\Sigma_i$ will, respectively, denote the
mean and covariance matrix of population $\Pi_i$, $i = 0, 1, \ldots, m$. It is well
known that for each non-singular $n \times n$ matrix $A$ and $n \times 1$ vector $a$, the
Bayes assignment of $x$ to $\Pi_i$ is equivalent to the Bayes assignment of
$A(x - a)$ to $\Pi_i$. We will later assume that $\eta_0 = 0$ and $\Sigma_0 = I$. This
assumption will impose no loss of generality in the results that follow, since
we may set $a = \eta_0$ and choose $A$ such that $A \Sigma_0 A^T = I$.

If the latter transformation of variables is necessary, we will not
introduce new symbols for the variate $A(X - \eta_0)$, the densities $p_i(A(x - \eta_0))$,
and their associated means and covariance matrices. Whenever $Q$ is
an $s \times n$ matrix of rank $s$ ($s < n$), we will denote the $s$-variate normal
density of $Qx$ (for population $\Pi_i$) by $p_i(Qx)$.
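The normalization described above (setting $a = \eta_0$ and choosing $A$ with $A \Sigma_0 A^T = I$) can be sketched numerically. The following is only an illustration under stated assumptions: the function name `whiten_to_reference` and the example means and covariances are hypothetical, and the Cholesky factor is just one of many valid choices of $A$.

```python
import numpy as np

def whiten_to_reference(means, covs, ref=0):
    """Apply x -> A (x - a) so that population `ref` gets mean 0 and covariance I.

    Hypothetical helper: A is taken as the inverse Cholesky factor of covs[ref],
    which is one possible matrix satisfying A covs[ref] A^T = I.
    """
    a = means[ref]
    L = np.linalg.cholesky(covs[ref])        # covs[ref] = L L^T
    A = np.linalg.inv(L)                     # hence A covs[ref] A^T = I
    new_means = [A @ (m - a) for m in means]
    new_covs = [A @ S @ A.T for S in covs]
    return A, a, new_means, new_covs

# Made-up example with two bivariate populations.
means = [np.array([1.0, 2.0]), np.array([0.0, -1.0])]
covs = [np.array([[4.0, 1.0], [1.0, 2.0]]),
        np.array([[1.0, 0.0], [0.0, 3.0]])]
A, a, new_means, new_covs = whiten_to_reference(means, covs)
```

After this step the reference population has mean 0 and identity covariance, while the remaining means and covariances are expressed in the transformed variate.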

PRINCIPAL RESULTS 


According to [1], let $k$ ($< n$) be the smallest integer for which
there exists a linear sufficient statistic ($k \times n$ matrix $T$) for the family
of probability measures having densities $p_i(x)$; $i = 0, 1, \ldots, m$. The
results in [1] demonstrate that the sufficiency of $T$ is equivalent
to the conditions:

(1) $T^+ T \eta_j = \eta_j$

(2) $T^+ T (\Sigma_j - I) = \Sigma_j - I$

$j = 0, 1, \ldots, m$,

where $(\cdot)^+$ denotes the generalized inverse of $(\cdot)$.
Let $M$ be the $n \times (n+1)m$ partitioned matrix

$$M \equiv [\eta_1 | \eta_2 | \cdots | \eta_m | \Sigma_1 - I | \Sigma_2 - I | \cdots | \Sigma_m - I]$$

and let $M = FG$ be a full rank decomposition [5] of $M$; that is, $F$ is $n \times k$,
$G$ is $k \times (n+1)m$ and rank($F$) = rank($G$) = $k$. Again, according to [1] and
the latter, $k$ must be precisely the smallest integer ($< n$) for which
a $k \times n$ matrix $T$ can be a sufficient statistic for the given family
of probability measures.
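As an illustration of the construction above, the matrix $M$ and one full rank decomposition $M = FG$ can be computed as follows. This is a sketch under assumptions: the helper names `build_M` and `full_rank_decomposition`, the tolerance, and the example populations are hypothetical, and the singular value decomposition is only one of several ways to obtain a full rank factorization.

```python
import numpy as np

def build_M(means, covs):
    """M = [eta_1 | ... | eta_m | Sigma_1 - I | ... | Sigma_m - I].

    Assumes population 0 is already normalized (mean 0, covariance I),
    so only populations 1..m contribute columns.
    """
    n = means[0].shape[0]
    mean_cols = [m.reshape(n, 1) for m in means[1:]]
    cov_blocks = [S - np.eye(n) for S in covs[1:]]
    return np.hstack(mean_cols + cov_blocks)

def full_rank_decomposition(M, tol=1e-10):
    """Return F (n x k) and G (k x cols) with M = F G and rank F = rank G = k."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    k = int(np.sum(s > tol * s.max())) if s.size and s.max() > 0 else 0
    F = U[:, :k]                      # orthonormal basis of the range of M
    G = s[:k, None] * Vt[:k, :]       # diag(s_k) V_k^T
    return F, G

# Made-up example: n = 3, m = 2, populations differ only in two coordinates.
means = [np.zeros(3), np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
covs = [np.eye(3), np.diag([2.0, 1.0, 1.0]), np.diag([1.0, 3.0, 1.0])]
M = build_M(means, covs)              # 3 x 8 matrix
F, G = full_rank_decomposition(M)     # here k = 2 < n = 3
```

With this choice $F$ has orthonormal columns, which is convenient but not required by the definition of a full rank decomposition.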

It is well known [5] that $M^+ = G^+ F^+$ and hence that $M M^+ = F F^+$. A
simple computation reveals that $T = F^T$ satisfies conditions (1) and (2)
(indeed, $T^+ T = (F^T)^+ F^T = F F^+ = M M^+$, which leaves every column of $M$ fixed),
so that $F^T$ is a sufficient statistic (of minimum left dimension) for
the given family of probability measures. We have the following
theorem.



Theorem 1. Let $\Pi_i$ be an n-variate normal population with a
priori probability $\pi_i > 0$, mean $\eta_i$ and covariance $\Sigma_i$; $i = 0, 1, \ldots, m$
(with $\eta_0 = 0$, $\Sigma_0 = I$) and let
$FG = M \equiv [\eta_1 | \cdots | \eta_m | \Sigma_1 - I | \cdots | \Sigma_m - I]$
be a full rank ($= k < n$) decomposition of $M$. Then, the n-variate
Bayes procedure assigns $x$ to $\Pi_j$ if and only if the k-variate Bayes
procedure assigns $F^T x$ to $\Pi_j$. Moreover, $k$ is the smallest integer for
which there exists a $k \times n$ compression matrix $T$ preserving the Bayes
assignment of $x$ and $Tx$ to $\Pi_i$; $i = 0, 1, \ldots, m$.

Proof: Recall that the n-variate Bayes procedure assigns $x$ to
$\Pi_j$ if and only if $\pi_j p_j(x) \ge \pi_i p_i(x)$; $i = 0, 1, \ldots, m$: $i \ne j$ (with arbitrary
assignment of $x$ to any of the populations $\Pi_i$ for which $\pi_j p_j(x) = \pi_i p_i(x)$).

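Let $R$ be any $(n-k) \times n$ matrix such that $C = R(I - F F^+)$ has rank
$n-k$ and note that $\pi_j p_j(x) \ge \pi_i p_i(x)$; $i = 0, 1, \ldots, m$: $i \ne j$ is equivalent to

$$\pi_j p_j\left(\begin{bmatrix} F^T \\ C \end{bmatrix} x\right) \ge \pi_i p_i\left(\begin{bmatrix} F^T \\ C \end{bmatrix} x\right); \quad i = 0, 1, \ldots, m: \; i \ne j.$$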

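For any $q = 0, 1, \ldots, m$, the n-variate normal density $p_q\left(\begin{bmatrix} F^T \\ C \end{bmatrix} x\right)$ has mean

$$\begin{bmatrix} F^T \eta_q \\ C \eta_q \end{bmatrix}$$

and covariance matrix

$$\begin{bmatrix} F^T \Sigma_q F & F^T \Sigma_q C^T \\ C \Sigma_q F & C \Sigma_q C^T \end{bmatrix}.$$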

Condition (1) implies $C \eta_q = 0$. Condition (2) implies that $I - F F^+$ commutes
with $\Sigma_q$ and it follows that $C \Sigma_q C^T = C C^T$ and $C \Sigma_q F = 0$. We may therefore
write $p_q\left(\begin{bmatrix} F^T \\ C \end{bmatrix} x\right)$ as the product of the respective k-variate and
(n-k)-variate densities $p_q(F^T x)$ and $p_q(Cx \mid F^T x)$, the conditional density of $Cx$
given $F^T x$. Since $p_q(Cx \mid F^T x) > 0$ does not depend upon $q = 0, 1, \ldots, m$,
it follows that the n-variate Bayes assignment of $x$ to $\Pi_j$, $j = 0, 1, \ldots, m$,
implies the k-variate Bayes assignment of $F^T x$ to $\Pi_j$. The foregoing arguments
are reversible and hence the k-variate Bayes assignment of $F^T x$ to $\Pi_j$
implies the n-variate Bayes assignment of $x$ to $\Pi_j$, completing the proof of
the equivalence. The minimality of $k$, in the sense that the n-variate
and k-variate Bayes assignments of $x$ and $F^T x$ are preserved, is a
consequence of the developments preceding the theorem.
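The equivalence asserted by the theorem can be checked numerically on a small example. The sketch below is illustrative only: the populations, priors, random test points, and helper names (`log_gauss`, `bayes_assign`) are assumptions, and $F$ is obtained from a singular value decomposition of $M$, which is one possible full rank decomposition.

```python
import numpy as np

def log_gauss(y, mean, cov):
    """Log density of the normal distribution N(mean, cov) at y."""
    d = y - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(cov, d))

def bayes_assign(y, priors, means, covs):
    """Index j maximizing pi_j p_j(y) (equal misclassification costs)."""
    scores = [np.log(p) + log_gauss(y, m, S) for p, m, S in zip(priors, means, covs)]
    return int(np.argmax(scores))

# Made-up populations with eta_0 = 0 and Sigma_0 = I (n = 3, m = 2).
n = 3
I = np.eye(n)
priors = [0.5, 0.3, 0.2]
means = [np.zeros(n), np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
covs = [I, np.diag([2.0, 1.0, 1.0]), np.diag([1.0, 3.0, 1.0])]

# M = [eta_1 | eta_2 | Sigma_1 - I | Sigma_2 - I] and T = F^T.
M = np.hstack([m.reshape(n, 1) for m in means[1:]] + [S - I for S in covs[1:]])
U, s, _ = np.linalg.svd(M)
k = int(np.sum(s > 1e-10))
T = U[:, :k].T

# Conditions (1) and (2): T^+ T fixes every mean and every Sigma_j - I.
P = np.linalg.pinv(T) @ T
cond1 = all(np.allclose(P @ m, m) for m in means)
cond2 = all(np.allclose(P @ (S - I), S - I) for S in covs)

# Compare the n-variate and k-variate Bayes assignments on random points.
t_means = [T @ m for m in means]
t_covs = [T @ S @ T.T for S in covs]
rng = np.random.default_rng(0)
X = 2.0 * rng.normal(size=(200, n))
agree = all(bayes_assign(x, priors, means, covs)
            == bayes_assign(T @ x, priors, t_means, t_covs) for x in X)
```

In this example the third coordinate carries no information for discriminating among the populations, so $k = 2$ and the two assignments coincide on every test point.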

CONCLUDING REMARKS 

Clearly the theorem is valid if there is at least one population
with mean 0 and covariance I, in which case we would label that
population $\Pi_0$. If this is not the case, one would choose some
population, say $\Pi_q$, and perform the change of variables $x \to A(x - \eta_q)$,
where $A \Sigma_q A^T = I$, prior to application of the theorem. The appropriate
statistic for compression, in terms of the original variates, would 
then be T=F T A~^. 

These results completely characterize the nature of data
compression for the Bayes classification procedure in the sense
that $k$ is the smallest allowable data compression dimension consistent
with preserving Bayes population assignment and, moreover, the
theorem provides an explicit expression for the compression matrix $T$
that depends only upon the known population means and covariances.

The statistic $T = F^T$ given by the theorem is by no means unique (e.g.,
for any non-singular $k \times k$ matrix $B$, $T = B F^T$ will do). It is also true
that there may be more efficient methods for calculating the
statistic $T$ (yet to be determined) than the method of full rank
decomposition of $M$.

It should be noted that the matrix M has an "excellent chance" 
of having rank equal to n. Even in the case of two populations (m=2), 

there may well be n linearly independent columns among the 2(n+l) columns 
of M and, therefore, no integer k<n and kxn rank k compression matrix T 
preserving the Bayes assignment of x and Tx. 
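The remark about $M$ typically having rank $n$ can be illustrated with a generic covariance matrix. The sketch below makes several assumptions for the sake of a small example: it uses a single non-reference population (so $M$ has only $n+1$ columns), and the randomly generated mean and covariance are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# A made-up covariance with no special relationship to the identity.
B = rng.normal(size=(n, n))
Sigma1 = B @ B.T + n * np.eye(n)
eta1 = rng.normal(size=n)

# M = [eta_1 | Sigma_1 - I] for one non-reference population.
M = np.hstack([eta1.reshape(n, 1), Sigma1 - np.eye(n)])
rank = np.linalg.matrix_rank(M)   # Sigma_1 - I is positive definite here, so rank is n
```

Here $\Sigma_1 - I$ is itself non-singular, so the columns of $M$ already span the whole space and no compression to $k < n$ dimensions preserves the Bayes assignment exactly.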


There has been extensive work [6], [7], [8], [9], [10], [11], [12], [13]
on determination of compression matrices (of a given rank) based upon
criteria that, generally, attempt to describe the relative (to the
variate $x$) "information content" in the variate $Tx$ (e.g., divergence,
Bhattacharyya distance, Chernoff bound, principal components, Wilks
scatter, etc.). While these criteria provide bases for calculating
compression matrices $T$, they provide little or no means for determining
the degradation in probability of misclassification or sensitivity to
population assignments.

In sampling situations one may choose to replace the columns of the
matrix $M$ by their estimates, that is, $\eta_j$ by $\bar{x}_j$ and $\Sigma_j$ by $S_j$. The matrix
defined by the estimates suggests a compression technique based on the
selection of a $k$-dimensional hyperplane which in some sense best fits the
range space of the matrix

$$[\bar{x}_1 | \bar{x}_2 | \cdots | \bar{x}_m | S_1 - I | S_2 - I | \cdots | S_m - I]$$

where $\bar{x}_0 = 0$ and $S_0 = I$.
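One way to make the idea of a best-fitting $k$-dimensional hyperplane concrete is to take the leading left singular vectors of the estimated matrix, which give the closest subspace in the least-squares sense. This is a choice made here for illustration, not a criterion stated in the paper; the simulated samples, the helper name `best_fit_subspace`, and the value $k = 1$ are likewise assumptions.

```python
import numpy as np

def best_fit_subspace(M_hat, k):
    """Orthonormal basis (n x k) of the k-dimensional subspace closest, in the
    least-squares sense, to the columns of M_hat, plus the singular values."""
    U, s, _ = np.linalg.svd(M_hat, full_matrices=False)
    return U[:, :k], s

rng = np.random.default_rng(2)
n, N = 3, 500

# Made-up samples: population 0 is roughly N(0, I); population 1 differs
# from it only along the first coordinate.
X0 = rng.normal(size=(N, n))
X1 = rng.normal(size=(N, n)) * np.array([2.0, 1.0, 1.0]) + np.array([1.5, 0.0, 0.0])

# Normalize so that the sample mean and covariance of population 0 are 0 and I.
xbar0 = X0.mean(axis=0)
S0 = np.cov(X0, rowvar=False)
A = np.linalg.inv(np.linalg.cholesky(S0))      # A S0 A^T = I
Y1 = (X1 - xbar0) @ A.T                        # transformed sample of population 1
xbar1 = Y1.mean(axis=0)
S1 = np.cov(Y1, rowvar=False)

# Estimated matrix [xbar_1 | S_1 - I] and a rank-1 approximate compression.
M_hat = np.hstack([xbar1.reshape(n, 1), S1 - np.eye(n)])
F_hat, sing = best_fit_subspace(M_hat, k=1)
T_hat = F_hat.T                                # acts on the normalized variate
```

Because the estimates are noisy, the estimated matrix generally has full rank, and the decay of the singular values in `sing` is one possible guide to choosing $k$.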

We feel that the results in this paper shed some light upon the 
subject. In future work we intend to extend these results and the results 
of [1] to a related concept of an "almost sufficient" statistic. 